Modeling the counts of outcomes across multiple categorical trials.
General Principles
To model the relationship between a vector outcome, in which each element is the frequency of one of more than two categories, and one or more independent variables, we can use a Multinomial model.
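Before fitting such a model, it helps to see the shape of the outcome data. Below is a minimal sketch, using plain NumPy rather than the BI package, that simulates hypothetical frequency vectors over K = 3 categories (the probabilities and counts are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each row is one observation: counts of n_trials outcomes over K = 3 categories.
n_trials = 10
probs = np.array([0.2, 0.5, 0.3])  # hypothetical category probabilities
Y = rng.multinomial(n_trials, probs, size=5)

print(Y)              # shape (5, 3): one frequency vector per observation
print(Y.sum(axis=1))  # every row sums to n_trials
```

Each row of `Y` is one multinomial outcome vector Y_i; the model's job is to relate these frequencies to the predictors.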
Below is an example code snippet demonstrating a Bayesian multinomial model using the Bayesian Inference (BI) package. This example is based on McElreath (2018).
using BayesianInference

# Setup device ------------------------------------------------
m = importBI(platform="cpu")

# Import Data & Data Manipulation ------------------------------------------------
data_path = m.load.sim_multinomial(only_path=true)
m.data(data_path, sep=',')

# Define model ------------------------------------------------
@BI function model(income, career)
    # Parameter prior distributions
    alpha = m.dist.normal(0, 1, shape=(2,), name='a')
    beta = m.dist.half_normal(0.5, shape=(1,), name='b')
    s_1 = alpha[0] + beta * income[0]
    s_2 = alpha[1] + beta * income[1]
    # ⚠️ Use jnp.array to create a Python object, so [0] indexing works
    s_3 = jnp.array([0.0])
    p = jnp.exp(jnp.stack([s_1[0], s_2[0], s_3[0]]))
    # Likelihood
    m.dist.multinomial(probs = p[career], obs=career)
end

# Run mcmc ------------------------------------------------
m.fit(model)  # Optimize model parameters through MCMC sampling

# Summary ------------------------------------------------
m.summary()  # Get posterior distributions
Mathematical Details
We can model a vector of frequencies using a Multinomial distribution. For an outcome variable Y_i with K categories, the multinomial likelihood function is:

Y_i ~ Multinomial(n_i, \theta_i), with \theta_i = softmax(\phi_i)

Where:

Y_i is the outcome (i.e. the vector of frequencies across the K categories) for observation i, and n_i is the total number of trials for that observation.
\theta_i is a vector of probabilities, unique to each observation i, whose k-th element gives the probability that observation i falls in category k.
\phi_i gives the linear model for each of the K categories. Note that we use the softmax function to ensure that the probabilities \theta_i form a simplex (non-negative elements that sum to one).
Each element of \phi_i is obtained by applying a linear regression model with its own intercept \alpha_k and slope coefficient \beta_k. To ensure the model is identifiable, one category, K, is arbitrarily chosen as a reference or baseline category, and its linear predictor is set to zero. The coefficients for the other categories then represent the change in the log-odds of being in that category versus the reference category.
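The softmax construction described above, with the reference category's linear predictor pinned at zero, can be sketched in plain NumPy (the intercepts, slopes, and predictor value are hypothetical, purely for illustration):

```python
import numpy as np

def softmax(phi):
    # Subtract the max for numerical stability; the result sums to 1 (a simplex).
    e = np.exp(phi - phi.max())
    return e / e.sum()

# Hypothetical coefficients for K = 3 categories; category K is the reference.
alpha = np.array([0.5, -0.2])  # intercepts for categories 1..K-1
beta = np.array([1.0, 0.3])    # slopes for categories 1..K-1
x = 0.8                        # predictor value for one observation

# Linear predictors: the reference category's predictor is fixed at zero.
phi = np.append(alpha + beta * x, 0.0)
theta = softmax(phi)

print(theta)        # category probabilities for this observation
print(theta.sum())  # sums to 1
```

Because softmax is invariant to adding a constant to every element of \phi_i, fixing the reference category's predictor at zero removes the redundancy and makes the remaining coefficients interpretable relative to that category.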
Reference(s)
McElreath, Richard. 2018. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Chapman & Hall/CRC.